(57) Abstract: The present invention relates to a multi-processor computer system
comprising
- at least two processors for parallel execution of processes,
- at least two cache memory units, each being associated with and connected to a
separate processor,
- a connection bus connecting said processors and said cache memory units, and
- a process list unit connected to said connection line for storing a process list of
processes to be available for execution by said processors.
In order to enable power saving if no processes for execution are available, while
guaranteeing a fast wake-up procedure if such processes are available, it is proposed
according to the present invention that said processors are adapted for loading a global
wake-up variable signalling process additions of processes to said process list into their
associated cache memory unit, for switching into a low-power mode if said process list
contains no process for execution by said processors, and for switching into a
normal-power mode if said wake-up variable signals an addition of a process to said
process list. Thus, according to the present invention, the cache coherence protocol is
used for communicating and signalling the availability of processes for execution.
Multi-processor computer system



The present invention relates to a multi-processor computer system comprising
- at least two processors for parallel execution of processes,
- at least two cache memory units, each being associated with and connected to
a separate processor,
- a connection bus connecting said processors and said cache memory units, and
- a process list unit connected to said connection line for storing a process list of
processes to be available for execution by said processors.

Further, the present invention relates to a corresponding processor, a method
of scheduling the execution of processes and a method of executing the process by a
processor in such a multi-processor computer system. Still further, the present invention
relates to a computer program for implementing said methods.

Multi-processor computer systems execute multiple processes in parallel. Each
processor repeatedly selects a process that is ready for execution and executes it until the
process blocks or, in the case of pre-emptive scheduling, the time slice of the running process
expires. When there is no process ready for selection by a processor or, particularly, its
associated scheduler, the processor or its scheduler, respectively, waits in a spin loop until a
process which is ready for execution becomes available in the process list. A ready
process becomes available by an unblocking operation, e.g. a V semaphore operation,
executed by a process running on another processor.

In order to save power consumption, it is preferred to let the processor switch
to a low-power or sleep mode rather than letting it spin until a ready process becomes
available. However, it is important that other processors can wake up sleeping processors
without a large overhead. The standard way to wake up a processor out of the sleeping mode
is to send an interrupt to it. The overhead of this method can be large for many parallel
applications that have a fine-grain synchronization.



It is therefore an object of the present invention to provide a multi-processor
computer system, a corresponding processor, a method of scheduling the execution of
processes and a method of executing a process by a processor therein which provide a fast
and efficient way of executing processes wherein processors can be switched between a
low-power mode and a normal-power mode within a very short time and without a large
overhead.

This object is achieved according to the present invention by a multi-processor
computer system as claimed in claim 1, wherein said processors are adapted for loading a
global wake-up variable signalling process additions of processes to said process list into
their associated cache memory unit, for switching into a low-power mode if said process list
contains no process for execution by said processors and for switching into a normal-power
mode if said wake-up variable signals an addition of a process to said process list.

The present invention is based on the idea of using the cache coherence protocol
to wake up sleeping processors. Cache coherence protocols are designed to communicate
much faster than interrupts and therefore allow sleeping processors to be woken up in a very
efficient and fast way. A global wake-up variable is introduced according to the invention
which is held by the cache memory units of the processors. Said wake-up variable signals if a
process has been added to the process list. If a processor adds a process to the process list this
will be immediately signalled via the cache coherence protocol to the cache memory units of
the processors, causing the processors to switch from low-power mode into the normal-power
mode.

Preferred embodiments of the invention are defined in the dependent claims. A
processor for use in such a multi-processor computer system is defined in claim 6. A method
of scheduling the execution of processes is defined in claim 7. A method of executing a
process by a processor is defined in claim 8. A computer program for implementing said
methods is defined in claim 9. It should be noted that these devices and methods as well as
the computer program can be developed further in a similar or identical way as defined in the
dependent claims of claim 1.

According to a first preferred embodiment as defined in claim 2, switching into
the normal-power mode of the processors is caused by a change of the wake-up variable due
to an addition of a process to the process list. A processor adding a process to the process list
thus simply has to change the wake-up variable, e.g. by executing a store command as
claimed according to the preferred embodiment of claim 3 and writing any new value into
said variable. This will immediately be signalled to all cache memory units holding said
wake-up variable, causing a switching of the associated processors from low-power mode into
normal-power mode.

According to another aspect of the invention the processors are adapted to
send a request to other processors to drop the wake-up variable from their associated cache
memory unit when adding a process to said process list. In this way, too, other processors
will immediately be informed of an addition of a new process to the process list and thus
switch into the normal-power mode, in which they will try to get the process from the process
list for execution.

WO 2004/006097 PCT/IB2003/002849

Preferably, an invalidation-based cache coherence protocol is implemented in
the multi-processor computer system according to the invention. This means that on a read
command from a memory unit other cache memory units are checked to see whether they
contain a more up-to-date version of the data than is in the memory unit. If this is the case, the
processor holding the more up-to-date version of the data provides it to the memory. On a
write command to data in a cache memory unit, other processors are checked to see whether
they cache the same data item. If this is the case, they should invalidate the data item, i.e.
remove it from their cache memory unit. Regarding more details of cache coherence
protocols and, in particular, invalidation-based cache coherence protocols, reference is made
to John L. Hennessy and David A. Patterson, "Computer architecture, a quantitative
approach", Morgan Kaufmann Publishers, second edition, in particular chapter 8.3.

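The read and write rules of such an invalidation-based protocol can be illustrated with a small software model. This is only a sketch under simplifying assumptions: the `Cache` class, its `read` and `write` methods, the one-word cache lines and the address `0x40` are hypothetical names and values chosen for illustration, not part of the described hardware.

```python
class Cache:
    """One cache memory unit attached to a shared bus (hypothetical model)."""

    def __init__(self, memory, bus):
        self.memory = memory      # shared memory unit: dict address -> value
        self.bus = bus            # list of all cache units on the bus
        self.lines = {}           # address -> [value, dirty flag]
        bus.append(self)

    def read(self, addr):
        if addr not in self.lines:
            # Read miss: a cache holding a more up-to-date (dirty) copy
            # provides it to memory before the line is fetched.
            for other in self.bus:
                line = other.lines.get(addr)
                if other is not self and line is not None and line[1]:
                    self.memory[addr] = line[0]
                    line[1] = False
            self.lines[addr] = [self.memory.get(addr, 0), False]
        return self.lines[addr][0]

    def write(self, addr, value):
        # Write: every other cache that holds the item invalidates it,
        # i.e. removes it from its cache memory unit.
        for other in self.bus:
            if other is not self:
                other.lines.pop(addr, None)
        self.lines[addr] = [value, True]


memory, bus = {}, []
cache_a, cache_b = Cache(memory, bus), Cache(memory, bus)
cache_a.read(0x40)
cache_b.read(0x40)           # both caches now share the line
cache_a.write(0x40, 7)       # cache_b's copy is invalidated
dropped = 0x40 not in cache_b.lines
reread = cache_b.read(0x40)  # miss; cache_a supplies the newer value
```

The write by `cache_a` removes the line from `cache_b`; it is this disappearance of a cached line that the invention uses as a wake-up event.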
The invention will now be explained in more detail with reference to the 
drawings in which 

Fig. 1 shows a block diagram of a known multi-processor computer system,
Fig. 2 shows a flow chart of a known method of scheduling the execution of
processes,
Fig. 3 shows a flow chart of a method of scheduling the execution of
processes according to the invention,
Fig. 4 shows a flow chart of the method of adding a process to a process list
according to the invention, and
Fig. 5 shows a block diagram of a multi-processor computer system
according to the invention.

Fig. 1 shows a block diagram of a known multi-processor computer system.
Said computer system comprises a number of processors 1 (four in the present embodiment),
so-called central processing units (CPU), to each of which a cache memory unit 2 is
associated and connected. Further, a shared memory unit 3, for instance a random access
memory unit, comprising a list of processes to be executed by said processors 1 is provided.
The processors 1 are interconnected via the cache memory units 2 through an interconnection
line 4, such as a bus, to which the memory unit 3, which may also be regarded as comprising
a process list unit, is also connected.



A known method of scheduling the execution of processes in such a multi-processor
computer system is illustrated as a flow chart in Fig. 2. The selection of a ready
process, i.e. a process that is ready for execution by a processor, consists of waiting until a
process appears in a list of ready processes called the "process list" (step S10). Multiple
processors can be waiting for this, so that the process list has to be protected by a lock, since
otherwise it is possible that, before the processor takes the ready process from the list, it is
taken by another processor (S11). The ready process is then taken from the list in step S12,
whereafter the process list is unlocked again for access by other processors which are trying
to get processes for execution (S13). If the processor was successful in getting a process
for execution from the process list (S14), it will execute this process; otherwise
it returns to the beginning, where it again tries to get a process from the
process list. The processors that are currently not executing a process are therefore
continuously checking whether the process list is empty (S10), as a kind of stand-by state or
spin loop.

Fig. 3 shows an embodiment of a method of scheduling the execution of
processes according to the present invention in the form of a flow chart. According to the
invention a global variable "wake-up" that is used to signal additions to the process list is
introduced. It shall be assumed that, at the beginning, a processor is in a normal-power
mode and looking for a ready process. In a first step S20 the processor loads the cache line
containing the wake-up variable into its cache memory, if it is not already there, by use of a
normal load instruction. Next, the processor checks whether the process list is empty (S21). If
the processor has found a ready process in the process list in step S21, it first locks the
process list in step S22 to prevent access to said process list by other processors. Next, the
processor gets the process from the process list (S23), whereafter the process list is unlocked
again (S24).

If step S23 was successful the processor will execute the taken process (S25).
The context of that process is restored and the process continues execution.

If step S23 was not successful, a so-called sleep-while-cached (swc)
instruction will be executed (S27) with the wake-up variable as parameter. This means that
the processor switches from its normal-power mode into a low-power mode, i.e. into some kind
of sleeping mode, in which it remains as long as the wake-up variable is in its associated
cache memory unit or, to be more precise, as long as the cache line of its cache memory unit
holds the wake-up variable. The same swc instruction is executed in case step S21 gives a
positive result, i.e. if the process list is found empty (S26).

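The flow of steps S20 to S27 can be sketched in software as follows. This is a minimal model under stated assumptions, not the hardware mechanism itself: the `Processor` class, the `WAKE_UP` address label, the `mode` strings and the `invalidate` method (standing in for the coherence protocol dropping the cache line) are all hypothetical names introduced for illustration.

```python
from collections import deque
from threading import Lock

WAKE_UP = "wake_up"   # hypothetical address of the global wake-up variable


class Processor:
    def __init__(self, process_list, list_lock):
        self.process_list = process_list
        self.list_lock = list_lock
        self.cached = set()       # addresses currently held in the cache
        self.mode = "normal"

    def swc(self, addr):
        # sleep-while-cached: low-power mode only while addr is cached.
        self.mode = "low-power" if addr in self.cached else "normal"

    def invalidate(self, addr):
        # The cache line was dropped by the coherence protocol: wake up.
        self.cached.discard(addr)
        self.mode = "normal"

    def schedule(self):
        """One pass through the flow chart; returns a process or None."""
        self.cached.add(WAKE_UP)              # S20: load before checking
        if not self.process_list:             # S21: process list empty?
            self.swc(WAKE_UP)                 # S26: sleep
            return None
        with self.list_lock:                  # S22 lock ... S24 unlock
            process = (self.process_list.popleft()
                       if self.process_list else None)   # S23
        if process is None:
            self.swc(WAKE_UP)                 # S27: another processor won
        return process                        # S25: caller executes it


ready = deque()
cpu = Processor(ready, Lock())
first = cpu.schedule()            # list empty: the processor goes to sleep
mode_after_first = cpu.mode
ready.append("P1")
cpu.invalidate(WAKE_UP)           # another processor stored to the variable
second = cpu.schedule()
```

Note that the sketch loads the wake-up variable before it inspects the process list, mirroring the order of S20 and S21; an addition that happens after the check then still drops the cached line and ends the sleep.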

If, as shown in Fig. 4, another processor appends a process to the process list
for execution, it first locks the process list (S30) before it actually appends the process (S31).
After unlocking the process list again (S32), a store command will be performed on the
wake-up variable, i.e. a new value will be assigned to the wake-up variable (S33). This will
immediately signal to all processors being in a low-power mode that a new process has been
added to the process list and will cause an invalidation of the cache line in the cache memory
units of such processors, which then switch back from low-power mode to normal-power
mode and start again with step S20 (see Fig. 3).

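Steps S30 to S33 can be sketched with threads standing in for processors. The sketch below is an assumption-laden software analogy: `WakeUpVariable`, its `load` and `store` methods and `add_process` are hypothetical names, and a `threading.Event` per sharer models the invalidation of that sharer's cache line.

```python
import threading
from collections import deque

process_list = deque()
list_lock = threading.Lock()


class WakeUpVariable:
    """Hypothetical model of the wake-up variable and the caches holding it."""

    def __init__(self):
        self.value = 0
        self._sharers = []                  # one Event per caching processor
        self._guard = threading.Lock()

    def load(self):
        # A processor caches the variable; the returned event is set
        # when its cache line is invalidated.
        dropped = threading.Event()
        with self._guard:
            self._sharers.append(dropped)
        return dropped

    def store(self, value):
        # A store invalidates every cached copy, waking the sleepers.
        with self._guard:
            self.value = value
            sharers, self._sharers = self._sharers, []
        for dropped in sharers:
            dropped.set()


def add_process(process, wake_up):
    with list_lock:                         # S30 lock ... S32 unlock
        process_list.append(process)        # S31
    wake_up.store(wake_up.value + 1)        # S33: any new value will do


wake_up = WakeUpVariable()
dropped = wake_up.load()
results = []


def sleeper():
    dropped.wait()                          # stands in for the swc instruction
    with list_lock:
        results.append(process_list.popleft())


thread = threading.Thread(target=sleeper)
thread.start()
add_process("P1", wake_up)
thread.join()
```

Because the store happens only after the process has been appended and the list unlocked, the woken thread always finds the new entry in the list.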
By this method much power can be saved, since processors not executing a
process are not waiting in a spin loop in normal-power mode but are switched into a
low-power mode. At the same time, since according to the present invention a cache coherence
protocol is used for signalling additions of processes to the process list using said wake-up
variable held in the cache memory units of sleeping processors, the wake-up procedure is
very fast, in particular faster than using interrupts.

A block diagram of a multi-processor computer system in which the invention
is implemented is shown in Fig. 5. Between the processor 1 and the cache memory unit 2,
additional communication lines 7, 8 besides the normal data path 6 are added according to the
present invention. Communication line 7 is used to communicate a wake-up address from the
processor 1 to the cache memory unit 2, i.e. to pass the address specified by the swc
instruction to the cache memory unit 2. Communication line 8 is used to communicate a
wake-up signal from the cache memory unit 2 to the processor 1 to cause it to switch from
low-power mode to normal-power mode when the specified address disappears from the
cache memory unit.

If, in step S33 of Fig. 4, a processor stores an arbitrary value to the wake-up
variable, which is a normal store instruction, the following happens. If another processor is
looking for a ready process then that processor has the wake-up variable in its cache memory
unit, meaning that the cache contains the cache line that corresponds to the memory block in
which the wake-up variable is stored. If another processor is caching the wake-up variable
then the processor intending to store an arbitrary value to the wake-up variable cannot
directly modify the wake-up variable by means of such a store instruction, since the cache
coherence protocol prevents this. In order to do the store operation the processor sends out a
broadcast to all other processors with a request to drop the wake-up variable from their cache
memory unit. In terms of the cache coherence protocol, in particular of the MSI, MESI or
MOESI type, the processor makes a transition from shared or invalid to modified state. This
causes processors that were sleeping after an swc instruction to wake up and switch into the
normal-power mode. These processors will then check the process list, and one of them will be
successful in getting the just added process. The others will switch back into the low-power
mode according to the swc instruction.

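The state transitions described here can be traced in a small MSI-style model. The sketch assumes a plain three-state MSI protocol and tracks only the cache line holding the wake-up variable; `State`, `Core`, `load` and `store` are hypothetical names introduced for illustration.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"


class Core:
    """A processor together with the state of its wake-up cache line."""

    def __init__(self):
        self.state = State.INVALID
        self.sleeping = False

    def swc(self):
        # Sleep only while the wake-up line is still held in the cache.
        self.sleeping = self.state is not State.INVALID

    def drop(self):
        # Request from another processor: invalidate the line and wake up.
        self.state = State.INVALID
        self.sleeping = False


def load(reader, cores):
    # A read downgrades a modified copy elsewhere to shared.
    for core in cores:
        if core is not reader and core.state is State.MODIFIED:
            core.state = State.SHARED
    if reader.state is State.INVALID:
        reader.state = State.SHARED


def store(writer, cores):
    # Broadcast the drop request, then move to the modified state.
    for core in cores:
        if core is not writer and core.state is not State.INVALID:
            core.drop()
    writer.state = State.MODIFIED


cores = [Core(), Core(), Core()]
writer, sleeper_1, sleeper_2 = cores
for core in (sleeper_1, sleeper_2):
    load(core, cores)
    core.swc()
slept = [sleeper_1.sleeping, sleeper_2.sleeping]
store(writer, cores)
```

After the store, the writer holds the line in the modified state and both former sleepers hold it as invalid and are awake, ready to re-run the scheduling loop from step S20.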


It should be noted that a processor loads the wake-up variable before trying to
get a process from the process list. Doing this in the reverse order might lead to the situation
that the processor switches to the low-power mode while there is a ready process in the
process list.

Besides saving power in the process scheduler, the swc instruction
according to the present invention as explained above could be useful for other purposes
where fast synchronisation between processors is required. While many processors have
instructions to switch to a low-power sleep mode, there is no processor and no multi-processor
computer system known that is able to wake up and switch into the normal-power mode
because of cache coherence transactions as proposed according to the present invention,
which provides a very fast and effective solution.

CLAIMS:



1. Multi-processor computer system comprising
- at least two processors for parallel execution of processes,
- at least two cache memory units, each being associated with and connected to
a separate processor,
- a connection bus connecting said processors and said cache memory units, and
- a process list unit connected to said connection line for storing a process list of
processes to be available for execution by said processors,
wherein said processors are adapted for loading a global wake-up variable signalling process
additions of processes to said process list into their associated cache memory unit, for
switching into a low-power mode if said process list contains no process for execution by
said processors and for switching into a normal-power mode if said wake-up variable signals
an addition of a process to said process list.

2. Multi-processor computer system as claimed in claim 1,
wherein said processors are adapted to switch into the normal-power mode if the wake-up
variable held in the associated cache memory units is changed due to an addition of a process
to said process list.

3. Multi-processor computer system as claimed in claim 1,
wherein said processors are adapted to execute a store command on the wake-up variable
when adding a process to said process list.

4. Multi-processor computer system as claimed in claim 1,
wherein said processors are adapted to send a request to other processors to drop the wake-up
variable from their associated cache memory unit when adding a process to said process list.

5. Multi-processor computer system as claimed in claim 1,
wherein said computer system is adapted for implementing an invalidation-based cache
coherence protocol.

6. Processor for use in a multi-processor computer system comprising
- at least two processors for parallel execution of processes,
- at least two cache memory units, each being associated with and connected to
a separate processor,
- a connection bus connecting said processors and said cache memory units, and
- a process list unit connected to said connection line for storing a process list of
processes to be available for execution by said processors,
wherein said processor is adapted for loading a global wake-up variable signalling process
additions of processes to said process list into its associated cache memory unit, for switching
into a low-power mode if said process list contains no process for execution by said
processor and for switching into a normal-power mode if said wake-up variable signals an
addition of a process to said process list.



7. Method of scheduling the execution of processes in a multi-processor
computer system comprising
- at least two processors for parallel execution of processes,
- at least two cache memory units, each being associated with and connected to
a separate processor,
- a connection bus connecting said processors and said cache memory units, and
- a process list unit connected to said connection line for storing a process list of
processes to be available for execution by said processors,
said method comprising the steps of:
- loading a global wake-up variable signalling process additions of processes to
said process list by a processor into its associated cache memory unit,
- adding a process to said process list, and
- changing the wake-up variable signalling said addition of a process to said
process list, thus causing said processor to switch from a low-power mode into a
normal-power mode.



8. Method of executing a process by a processor in a multi-processor computer
system comprising
- at least two processors for parallel execution of processes,
- at least two cache memory units, each being associated with and connected to
a separate processor,
- a connection bus connecting said processors and said cache memory units, and
- a process list unit connected to said connection line for storing a process list of
processes to be available for execution by said processors,
said method comprising the steps of:
- loading a global wake-up variable signalling process additions of processes to
said process list into an associated cache memory unit,
- switching into a low-power mode if said process list contains no process for
execution by said processor,
- switching into a normal-power mode if said wake-up variable signals an
addition of a process to said process list, and
- accessing said process list to get said added process for execution.

9. Computer program comprising computer program code means for causing a
computer to perform the steps of the method as claimed in claim 7 or 8 if said methods are
executed by said computer.